Features

Study Design

Don’t ruin a perfectly good product with a perfectly flawed study design

By: Debbie Wilkerson, Ph.D., and Gizelle Baker, Ph.D.

“I don’t care what the numbers say . . . does it make sense?”

—Daniel Herschlag, Ph.D., Stanford University

 

The Food and Drug Administration (FDA) is changing how it reviews drugs and devices, prompted by numerous drug and device recalls as well as multiple publications addressing problems with approved products.


How do drug and device developers prepare for the upcoming changes? One area that sponsors should start addressing sooner rather than later is how they design clinical studies.

Appropriately designed clinical trials can answer the study's objectives while reducing confounding and bias in the results, thereby reducing the overall time and cost of conducting a trial.


No two studies are exactly alike, especially when you compare device studies with drug studies; however, many general principles carry across studies for both classes of products. The similarities in study design generally relate to the objective of the study and the questions the study is designed to answer, rather than to whether the investigational product is a drug or a device. The principles of study design are discussed in the domestic and international regulations: ICH E6, ICH E8, FDA 21 CFR 312, FDA 21 CFR 812, and ISO 14155:2011. This article covers some of the key principles of study design and discusses how they may affect the results, their interpretation, and the approvability of an investigational product.


There are many examples of drugs and devices that were delayed during the approval process because the pivotal studies were not designed to answer the questions required for approval. Two recent examples:

  • Feb. 28, 2011: The FDA did not accept a marketing application for Rhucin (recombinant human C1 inhibitor) as it “did not provide data for a sufficient number of subjects to support the proposed dose.”
  • Feb. 26, 2011: The FDA asked for more information on a potential treatment for Gaucher disease, specifically requesting “more data from two studies…”

 

It is hard to imagine, with all the regulations and guidances that currently exist, why there are still major delays and study failures related to study design. But clinical trials are designed for humans, not chemicals in a test tube or animals in a laboratory, and you cannot control for all of the human factors in a study, such as:

  • Genetic and environmental differences
  • Dietary and behavioral differences
  • Comorbidities and medical treatment differences
  • Compliance and participation differences

 

Therefore, we need to design clinical studies to control what can be controlled, to account or adjust for what cannot be controlled, and to eliminate as many potential sources of bias as possible.


A recent Datamonitor report (2009) stated, “Approximately 58% of Phase III clinical trials are unsuccessful, with the primary cause for failure being the inability of the data to demonstrate efficacy of the tested product against placebo.”


There are many reasons why studies might not show statistical significance, the first and most obvious being that the product is ineffective; sometimes the product just doesn’t work as well as projected. A properly designed study, however, may help identify an ineffective product earlier in development. The sooner a product can be shown to be ineffective, the better: fewer patients are exposed, and the developer may be able to redirect the time, resources and money to an effective product.


Another reason studies fail to show statistical significance is that the study is poorly designed and/or implemented, introducing bias and/or confounding into the results. Examples include underpowering the study, enrolling an inappropriate patient population, including a poor comparison/control group, and studying the wrong dosing/treatment regimen.

The most important steps when designing a study are determining what questions the study is going to answer and what purpose the results will have (including who the audience will be).

The design of a study will be very different if the purpose is to generate hypotheses, identify the mechanism of action, or serve as pivotal data for the regulatory approval of a product.

There are a number of basic types of study design, such as retrospective case-control, prospective cohort, crossover, and the randomized controlled trial. The gold standard for providing “proof” is the randomized controlled trial, as this design eliminates many potential sources of bias and confounding, allows for the determination of a causal relationship, and can be used to assess multiple outcomes within the same trial. The disadvantages of a randomized controlled trial include that it is the most costly, takes the longest time, isn’t feasible for rare events or diseases with long lag times, includes restrictive sampling and therefore is not truly generalizable, and may be complicated by difficult physician/patient recruitment.


A poor study design cannot be salvaged by good statistics. The data are the data and the results are the results, no matter how you manipulate or transform the data to fit into statistical formulas. Additionally, the FDA receives the raw data; it can perform its own analyses and draw its own conclusions, independent of how the developer packages the data and presents the results.


One of the biggest problems in clinical trial design is the introduction of bias. Sackett (1979) identified and categorized 56 different types of bias, including biases emanating from the clinical investigator (e.g., selection/expectation bias), the study subject (e.g., volunteer/recall bias), and the analyst or statistician.


Investigator Bias


A meta-analysis of controlled trials showed that failing to conceal treatment allocation and the absence of double blinding yield study results that exaggerate the effect of treatment, usually in a positive direction. Schulz et al. (1995) found that trials that were not double-blinded yielded larger estimates of effects (p = 0.01), with the odds ratio exaggerated by 17%. This bias is prevalent even in preclinical studies. A clear example of investigator bias was presented in the Rosenthal (1976) publication on rats with brain lesions. In this study, each rat was labeled as either having a lesion or not having a lesion. When the labels correctly identified which rats had brain lesions, the study showed better performance in the rats without lesions. However, when the exact same study was performed with rats that were incorrectly labeled, the study showed no difference in performance related to the presence or absence of brain lesions.


Historical Controls


A meta-analysis by Sacks et al. (1982) of 106 articles focused on six therapeutic questions revealed that bias can arise from the use of historical controls. The analysis compared the outcomes of trials of the same treatments and found that 79% of clinical trials with a historical control group showed a benefit over control, while only 20% of trials with a concurrent control group did. Reasons for the difference include that the response rates of historical controls can change over time due to changes in overall life expectancy and quality of life, changes in diagnostics, and changes in background treatments and supportive care. Additionally, if the historical controls were not treated, those patients will not show the benefits (e.g., placebo effect) seen in placebo-treated controls (Diehl 1986, Moertel 1984).
 

Inclusion of a Concurrent Control Group


An early meta-analysis of psychiatric studies reviewed the reported treatment efficacy in studies with and without a concurrent control group. The study found that 25% of studies with a control group reported treatment efficacy, vs. 83% of studies without a control group (Foulds 1958).

These results have been mirrored in other studies, such as the meta-analysis carried out by Viamontes (1972), which found that 6% of controlled studies reported treatment efficacy compared to 94% of uncontrolled studies. These results emphasize the need to control the introduction of bias and to account for the potential placebo effect by concurrently enrolling a control group in studies evaluating effectiveness.


Blinding the Treatment Allocation


Epstein (1996) carried out a review of 11 trials of anti-CD4 monoclonal antibody in the treatment of rheumatoid arthritis. Of these trials, eight were open-label trials (n=129 subjects), all of which demonstrated clinical benefit (27-63% of subjects significantly improved). Of the three trials where treatment was blinded (n=133 subjects), none showed a clinical benefit (0-19% of subjects significantly improved).


Randomized Treatment Assignment


Chalmers et al. (1983) reviewed 145 publications comparing randomized and non-randomized treatment allocation for acute myocardial infarction; the difference in reported case-fatality rates between treatment and control groups was 8.8% for blinded randomized studies vs. 58.1% for non-randomized studies (p<0.05). Carroll et al. (1996) reviewed 19 studies of transcutaneous electrical nerve stimulation (TENS) for acute post-operative pain. When they examined the differences in study design, they found reported treatment efficacy in 12% of randomized studies vs. 89% of non-randomized studies. This reveals the bias introduced into the results when site personnel know the treatment assignment.


Active Compared with Inert Placebo


“Active placebos” are placebos/comparators in a study that mimic some of the side effects expected on the experimental medication without providing any benefit for the indication being treated. Lindner et al. (2007) compared studies using an active placebo control group with studies using an inert placebo control group and found that the placebo selected in a study could be another source of bias. In a study by Fisher et al. (1964), there were no differences in efficacy noted between active treatment and placebo unless the investigator informed the subjects that specific side effects were associated with the active treatment. Shapiro et al. (1997) found that in antidepressant studies using an inert placebo, the antidepressant was effective in 59% of the studies; however, in studies with an active placebo (one that produces side effects similar to the active treatment), the antidepressant was effective in only 14% of the studies.


The introduction of bias into a study at any level can result in incorrect conclusions, and the examples above specifically illustrate how the incorrect conclusions can easily be influenced by key components of study design.


Designing a Study To Include the Appropriate Population


It is important to remember that clinical trials do not recruit a truly random sample. The sample is usually a sample of convenience, because it includes only subjects who come into a study site; subjects must be available to participate, willing to volunteer, and meet the inclusion/exclusion criteria of the study. Therefore, during the development of the study it is critical that the study can recruit from a wide pool of patients and that the inclusion/exclusion criteria truly define the intended patient population (including the full range of disease severity the investigational product is intended to treat). Once the study population is defined, it is important that the subjects enrolled in the study meet the entry criteria, as deviations can invalidate the population.


There are many recent examples of studies completed with the wrong patient population, a problem that could have been addressed during study design, including:

  • Dendreon’s sipuleucel-T (Provenge) received an “Approvable” letter from the FDA on May 8, 2007 because the agency found that the study population did not represent the general population with the associated morbidities. This delayed the drug’s approval by at least two years.
  • Wyeth’s first-in-class antibiotic Tygacil for the treatment of adult patients with community-acquired pneumonia (CAP) received an “Approvable” letter from the FDA in which the agency requested additional analyses of patients with CAP severe enough to require hospitalization, including those at highest risk of mortality.
  • Isis delayed its new drug application (NDA) for mipomersen, a treatment for a rare genetic high-cholesterol disorder, because the FDA requested a new study in patients with the severe form of the disease.


Selecting Appropriate Endpoints


There are many components of an endpoint, and it is important to take them all into consideration: 1) selecting the timepoint of the endpoint, 2) selecting an endpoint that is responsive to the intervention, 3) selecting an endpoint that is not subject to competing risk (i.e., survivor bias), and 4) if a surrogate endpoint is used, ensuring that the surrogate is validated.


Surrogate endpoints are often used in clinical trials when the clinical endpoint cannot be measured, or when the timeline for observing the clinical endpoint is too long for the trial. In selecting a surrogate endpoint, one must ensure that the intervention directly impacts the surrogate endpoint and that the surrogate endpoint is directly related to the clinical endpoint. Otherwise the surrogate endpoint will not reflect the impact of the intervention on the clinical endpoint.


An appropriate endpoint can be measured in the patient population with an adequate signal strength to create a “doable” trial. The important number in powering a randomized controlled trial is not simply the number of patients, but the number of outcome events (signal strength) that can be observed in those patients (Sackett 2001).


Sackett (2001) also reviewed why even randomized clinical trials fail, and why they don’t have to. In his review he outlines three common features of studies that can lead to completely different results if not accounted for correctly during study design: the responsiveness of the patients (both with and without intervention), the compliance of patients (both with treatment and with study procedures), and the completeness of the data (outcome events).


The responsiveness of patients – Subjects at higher risk of an event are the most likely to experience one during the study. If a study includes both high- and low-risk patients, the high-risk patients are more likely to experience a clinically significant outcome than the low-risk patients.


The compliance of patients – Patients with higher compliance are more likely to experience the experimental effect than patients who are less compliant (if the investigational product is effective). Therefore, if a substantial number of patients in the study are non-compliant, they reduce the effect size of the study (and in turn increase the p-value) or force over-enrollment to compensate for an event rate confounded by non-compliance.
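This dilution can be quantified with simple arithmetic. The sketch below (the 30%/20% event rates are hypothetical) treats non-compliant patients in the active arm as if they experience the control event rate, and shows how the observed absolute risk reduction, and hence the required sample size, degrades:

```python
def diluted_arr(p_ctrl: float, p_trt: float, compliance: float) -> float:
    """Observed absolute risk reduction under intention-to-treat when a
    fraction (1 - compliance) of the active arm behaves like controls."""
    observed_trt = compliance * p_trt + (1 - compliance) * p_ctrl
    return p_ctrl - observed_trt

# Hypothetical true event rates: 30% on control, 20% on treatment
# (true absolute risk reduction = 0.10).
for c in (1.0, 0.8, 0.6):
    arr = diluted_arr(0.30, 0.20, c)
    # required sample size scales roughly with 1/ARR^2, so the
    # inflation factor relative to perfect compliance is (0.10/ARR)^2
    print(f"compliance {c:.0%}: observed ARR {arr:.3f}, "
          f"n inflation x{(0.10 / arr) ** 2:.2f}")
```

Under these assumed rates, 80% compliance shrinks the observed effect from 0.10 to 0.08 and inflates the required sample size by roughly half again; at 60% compliance the study needs nearly three times as many patients to retain its power.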


Completeness of the data (outcome events) – The most critical data in any study are the primary endpoint data (outcome events); if these data are missed, the outcome of the study can change. If data are missed equally from the control and experimental arms, this will in general decrease the absolute risk reduction (thereby increasing the p-value). A bigger effect is seen when data are missed unequally between the control and experimental arms. Systematically missing more events in the control arm can even lead to a false negative result.
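A small worked example (again with hypothetical event rates) makes these cases concrete: missing events equally in both arms shrinks the observed absolute risk reduction, while missing them preferentially from the control arm can even flip its sign:

```python
def observed_arr(p_ctrl: float, p_trt: float,
                 miss_ctrl: float, miss_trt: float) -> float:
    """Observed absolute risk reduction when a fraction of outcome
    events goes uncaptured in each arm."""
    return p_ctrl * (1 - miss_ctrl) - p_trt * (1 - miss_trt)

# Hypothetical true event rates: 30% control, 20% treatment (true ARR 0.10).
for miss_ctrl, miss_trt in ((0.0, 0.0), (0.2, 0.2), (0.4, 0.0)):
    arr = observed_arr(0.30, 0.20, miss_ctrl, miss_trt)
    print(f"missed events: {miss_ctrl:.0%} control / {miss_trt:.0%} "
          f"treatment -> observed ARR {arr:+.3f}")
```

Missing 20% of events from both arms shrinks the observed effect from 0.10 to 0.08; missing 40% of events from the control arm alone turns the observed effect negative, making an effective treatment look harmful.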


Key principles of study design always need to be considered, as they may impact the results, their interpretation, and the approvability of an investigational product.

 

References

Burnham JR (1966) Experimenter bias and lesion labeling. Purdue University. (unpublished manuscript)

Cardis et al.; The INTERPHONE Study Group (2010) Brain tumour risk in relation to mobile telephone use: results of the INTERPHONE international case-control study. Int J Epidemiol 39(3): 675-94.

Cardis et al. (2007) The INTERPHONE study: design, epidemiological methods, and description of the study population. Eur J Epidemiol 22: 647-64.

Carroll D, Moore RA, McQuay HJ, Fairman F, Tramer M, Leijon G (2001) Transcutaneous electrical nerve stimulation (TENS) for chronic pain. Cochrane Database Syst Rev 3 (CD003222).

Carroll D, Tramer M, McQuay H, Bye B, Moore A (1996) Randomization is important in studies with pain outcomes: Systematic review of transcutaneous electrical nerve stimulation in acute post-operative pain. British Journal of Anaesthesia (77): 798-803.

Chalmers TC, Celano P, Sacks HS, Smith H (1983) Bias in treatment assignment in controlled clinical trials. N Engl J Med 309(22): 1358-61.

Chalmers TC, Matta RJ, Smith H, Kunzler AM (1977) Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. N Engl J Med 297(20): 1091-96.

Diehl LF, Perry DJ (1986) A comparison of randomized concurrent control groups with matched historical control groups: are historical controls valid? J Clin Oncol 4(7): 1114-20.

Epstein WV (1996) Expectation bias in rheumatoid arthritis clinical trials. The anti-CD4 monoclonal antibody experience. Arthritis Rheum 39(11): 1773-80.

Fisher S, Cole JO, Rickels K, Uhlenhuth EH (1964) Drug set interaction: the effect of expectations on drug response in outpatients. In Bradley PB, Flugel F, Hoch P. (Eds.) Neuropsychopharmacology New York: Elsevier.

Fleming TR, DeMets DL (1996) Surrogate end points in clinical trials: are we being misled? Ann Intern Med 125(7): 605-13.

Foulds GA (1958) Clinical research in psychiatry. J Ment Sci 104(435): 259-65.

Greenberg RP, Bornstein RF, Greenberg MD, Fisher S (1992) A meta-analysis of antidepressant outcome under “blinder” conditions. J Consult Clin Psychol 60(5): 664-9.

Hahn P (2009) Study Design. Queen’s University. meds.queensu.ca/medicine/obgyn/scholar/design.pdf

Lindner MD (2007) Clinical attrition due to biased preclinical assessments of potential efficacy. Pharmacol Ther 115:148-75.

Moertel CG (1984) Improving the efficiency of clinical trials: a medical perspective. Stat Med 3 (4): 455-68.

Rosenthal R (1976) Interpersonal expectancy effects: a follow-up. Experimenter Effects in Behavioral Research: Enlarged Edition. New York: Irvington Publishers, Inc.

Sackett DL (1979) Bias in analytic research. J Chron Dis 32(1-2): 51-63.

Sacks H, Chalmers TC, Smith H (1982) Randomized versus historical controls for clinical trials. Am J Med 72(2): 233-40.

Schulz KF, Chalmers I, Hayes RJ, Altman DG (1995) Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effect. JAMA 273(5): 408-12.

Shapiro AK, Shapiro E (1997) The Powerful Placebo: From Ancient Priest to Modern Physician. Baltimore: The Johns Hopkins University Press.

Thompson WG (2005) The placebo effect and health: combining science and compassionate care. Prometheus Books.

Viamontes JA (1972) Review of drug effectiveness in the treatment of alcoholism. Am J Psychiatry 128(12), 1570-1.

 

Debbie Wilkerson, Ph.D. is chief scientific officer at OV Clinical Trials. She can be reached at [email protected]. Gizelle Baker, Ph.D. is chief operating officer at OV Clinical Trials. She can be reached at [email protected].
